Cleaning

Exploratory analysis: What’s in the AFT corpus?

Tale Types

  • Proportion of ATU types represented in the AFT corpus

By ATU Chapter/Division

Summary statistics by ATU chapter, sorted by tales_per in descending order:

chapter                          n_types  n_tales  pct_with_tales  tales_per
TALES OF MAGIC                       240      406            15.8       10.7
OTHER ANIMALS AND OBJECTS             50       64            14.0        9.1
RELIGIOUS TALES                      543      277             5.9        8.7
OTHER TALES OF THE SUPERNATURAL       36       41            13.9        8.2
ANIMAL TALES                         332      266            10.8        7.4
ANECDOTES AND JOKES                  993      267             4.2        6.4
FORMULA TALES                         53       52            18.9        5.2
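
The sketch below shows one way these per-chapter figures could be computed. It is an illustration, not the code behind the table. It assumes that pct_with_tales is the share of a chapter's ATU types with at least one AFT tale, and that tales_per is n_tales divided by the number of such represented types. The table is consistent with that reading: for TALES OF MAGIC, 15.8% of 240 types is about 38 types, and 406 / 38 ≈ 10.7. The function name, input shapes, and toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def chapter_summary(atu_types, tale_types):
    """Summarise corpus coverage per ATU chapter.

    atu_types:  iterable of (chapter, type_id) pairs, one per ATU type (assumed shape).
    tale_types: iterable of type_id values, one per tale in the corpus (assumed shape).
    """
    tales_by_type = Counter(tale_types)

    # Group type ids under their chapter.
    types_in = defaultdict(list)
    for chapter, type_id in atu_types:
        types_in[chapter].append(type_id)

    rows = []
    for chapter, type_ids in types_in.items():
        # A type is "represented" if at least one tale is assigned to it.
        represented = [t for t in type_ids if tales_by_type[t] > 0]
        n_tales = sum(tales_by_type[t] for t in type_ids)
        rows.append({
            "chapter": chapter,
            "n_types": len(type_ids),
            "n_tales": n_tales,
            "pct_with_tales": round(100 * len(represented) / len(type_ids), 1),
            "tales_per": round(n_tales / len(represented), 1) if represented else 0.0,
        })
    # Same ordering as the table: highest tales_per first.
    return sorted(rows, key=lambda r: r["tales_per"], reverse=True)


# Toy data (hypothetical, not taken from the ATU or the AFT corpus).
toy_types = [("A", "a1"), ("A", "a2"), ("A", "a3"), ("A", "a4"),
             ("B", "b1"), ("B", "b2")]
toy_tales = ["a1", "a1", "a1", "b1"]

rows = chapter_summary(toy_types, toy_tales)
for row in rows:
    print(row)
```

Dividing by represented types rather than all types keeps tales_per from being diluted by the many types that have no tales in the corpus.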

The treemap below shows the nested ATU categories into which AFT texts fall, by chapter, division, and sub_division.

Content

Entities

Common phrases

  • TextRank
  • collocation/word frequency
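
As a rough illustration of the collocation/word frequency item, the sketch below counts unigrams and adjacent word pairs and ranks the pairs by pointwise mutual information (PMI). PMI is one common collocation score and is an assumption here, not necessarily the measure intended for this analysis. The tokenizer, the min_count threshold, and the sample sentences are likewise hypothetical. TextRank is not covered.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Very simple tokenizer (assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def collocations(texts, min_count=2):
    """Return (unigram counts, bigrams scored by PMI, best first)."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        unigrams.update(tokens)
        # Adjacent pairs within one text; pairs never span two texts.
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unstable PMI estimates
        p_pair = count / n_bi
        p_independent = (unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)
        scored.append(((w1, w2), count, math.log2(p_pair / p_independent)))
    return unigrams, sorted(scored, key=lambda s: s[2], reverse=True)


# Hypothetical sample texts, not taken from the corpus.
sample = [
    "Once upon a time there was a king.",
    "Once upon a time a fox met a crow.",
]
unigrams, scored = collocations(sample)
print(unigrams.most_common(3))
print(scored[0])
```

Raw frequency favours function words such as "a", while PMI favours pairs that co-occur more often than their individual frequencies would predict, so the two views are usually read together.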

Topic modeling

  • Define cleaning tasks and stop words to improve topic model performance; right now the topics are too close together, with a few main clusters that are difficult to distinguish
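
One possible starting point for the stop-word task is sketched below: combine a small base list with corpus-specific candidates, here taken to be words that occur in most documents. Such words tend to appear in every topic and may contribute to topics looking alike. The base list, the 0.8 document-frequency threshold, and the helper names are hypothetical, and the candidates would need manual review before use.

```python
import re
from collections import Counter

# Hypothetical starter list; a real list would be much longer.
BASE_STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "was", "he", "she", "it"}


def tokenize(text):
    # Very simple tokenizer (assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def corpus_stop_words(docs, max_doc_frac=0.8):
    """Candidate stop words: terms found in more than max_doc_frac of documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    return {w for w, df in doc_freq.items() if df / len(docs) > max_doc_frac}


def clean(docs, extra_stop_words=frozenset()):
    """Tokenize each document and drop base and extra stop words."""
    stop = BASE_STOP_WORDS | set(extra_stop_words)
    return [[t for t in tokenize(doc) if t not in stop] for doc in docs]


# Hypothetical sample documents, not taken from the corpus.
docs = [
    "The king said to the fox",
    "The queen said to the crow",
    "The fox said nothing",
]
extra = corpus_stop_words(docs)
cleaned = clean(docs, extra)
print(sorted(extra))
print(cleaned)
```

The cleaned token lists can then be passed to whichever topic-modeling tool is in use.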